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Abstract. The effect of target molecule depletion from the supernatant solution 
is incorporated into a physico-chemical model of hybridisation on oligonucleotide 
microarrays. Two possible regimes are identified: local depletion, in which depletion 
by a given probe feature only affects that particular probe, and global depletion, in 
which all features responding to a given target species are affected. Examples are given 
of two existing spike-in data sets experiencing measurable effects of target depletion. 
The first of these, from an experiment by Suzuki et al. using custom built arrays with 
a broad range of probe lengths and mismatch positions, is verified to exhibit local and 
not global depletion. The second dataset, the well known Affymetrix HGU133a latin 
square experiment is shown to be very well explained by a global depletion model. It is 
shown that microarray calibrations relying on Langmuir isotherm models which ignore 
depletion effects will significantly underestimate specific target concentrations. It is 
also shown that a combined analysis of perfect match and mismatch probe signals in 
terms of a simple graphical summary, namely the hook curve method, can discriminate 
between cases of local and global depletion. 
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1. Introduction 

Physico-chemical models describing the processes involved in converting concentrations 
of specific RNA or DNA targets hybridised onto oligonucleotide microarrays to observed 
fluorescence intensities have become commonplace [HI EOl [HI [7J, [5], [lOj HEl [171 HHJ 
EH [TBI EE, El E6]. The ultimate aim of such models has in general been to provide 
biologists with practical algorithms for estimating absolute specific target concentrations 
in the presence of a complex non-specific background from fluorescence intensity data. 
Early models inspired by Langmuir adsorption theory, which applied standard physical 
chemistry to the hybridisation of specific and non-specific targets to the microarray 
surface, predicted a hyperbolic response function [191 EHj which has been verified with 
reasonable accuracy [H] for the Affymetrix U95a Latin Square spike-in experiment [JJ. 
Refinements of the model to include the effects of probe and target folding and bulk 
hybridisation in the supernatant solution [51 [16] maintain the hyperbolic shape of the 
response function while decreasing the effective adsorption rate constant. Including the 
effects of post-hybridisation washing [T5l [21] also maintains the hyperbolic shape of the 
response function and is able to explain an asymptotic response in the limit of high 
target concentrations which is below full saturation of the probe feature and decreases 
with probe-target binding affinity [IB]. 

The above physico-chemical models generally assume that the concentration of 
target molecules in the supernatant solution is not appreciably depleted by the 
hybridisation reaction. However, in order to explain their data from spike in experiments 
which run to very low spike-in concentratons [30], Ono et al. [27] have recently extended 
the accepted adsorption model to include such target depletion effects. Their model 
predicted an interesting saturation effect which was borne out by experiment. As well 
as the usual saturation effect, in which the number of available probe molecules becomes 
exhausted in the limit of high target concentration for a fixed probe type, a second 
saturation effect occurs when the number of target molecules is exhausted in the limit 
of high binding affinity at fixed target concentration. This limit was realised by including 
on a custom-built microarray a series of features of increasing probe length. 

In the current paper we extend the Ono model by identifying two types of target 
depletion, which we term "local depletion" and "global depletion" . By local depletion we 
mean that depletion of target molecules in the supernatant solution by a hybridisation to 
a given probe feature only affects that particular probe feature. This is essentially Ono 
et al.'s "finite hybridisation model". This regime is relevant when diffusion and/or 
convection of targets is slow compared with the hybridisation and probe features 
responsive to the same target species are spatially separated on the microarray. By 
global depletion we mean that all probe features responding to a given target species 
are mutually affected by depletion of that species from the supernatant solution. Global 
depletion is relevant for spatially separated features undergoing permanent agitation 
of the hybridisation solution, if equilibrium includes rapid diffusion of transcripts 
through the microarray cartridge, or for neighbouring features such as the perfect 
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match/mismatch (PM/MM) probes on the older designs of Affymetrix GeneChips. 

We fit the models to two spike-in datasets. The first, that of Suzuki et al. [30], which 
covers a broad range of spike-in concentrations and probe lengths and for which we verify 
that the local model, not the global model, is relevant, is dealt with in Section [21 The 
second, the U133A Affymetrix Latin Square data set [I], for which the global model is 
appropriate, is dealt with in Section [3j For this data set we find that the global model 
of depletion entails a substantial improvement on earlier reported fits by a hyperbolic 
response function [13] . As well as fitting response functions or so-called "isotherms" , we 
analyse the data sets in terms of the recently developed hook curve formalism [8], [11] 
designed for the calibration of microarrays whose design includes PM/MM pairs. The 
hook curve method turns out to be a clear and easily implemented indicator of which 
depletion regime, local or global, is relevant to a particular dataset. 

Full details of our local and global depletion models, including specific and non- 
specific hybridisation of target molecules to probes at the microarray surface and of 
targets within the supernatant solution and the folding of target and probe molecules, 



are set out in|Appendix A[ Some technical details of the analysis of the global depletion 



model are given in Appendix B 



Other than the work of Ono et al. and a related project [26], we are aware of 
only one other extensive attempt to incorporate target depletion from the supernatant 
solution during hybridisation into a physico-chemical model of microarrays, namely a 
recent publication by Li et al. [23]. In Section H] we give a critical evaluation pointing 



out a number of errors in the Li et al. model, with details given in Appendix C 
2. The Suzuki data set: an example of local depletion 



Suzuki et al. [30] have carried out experiments in which a set of 150 cDNA target 
sequences, with and without a complex background, are hybridised onto custom arrays 
containing features with probes ranging in length from i = 14 to 25 DNA bases. The 
probe designs include perfect matches and mismatches, the mismatches being in each 
possible position (1, ...,£) and of each possible nucleotide. Spike-in concentrations 
covered a broad range from 1.4 fM to 1.4 nM. The purpose of the experiment was 
to determine probe lengths and mismatch positions which optimise the discrimination 
between PM and MM signals. Because the spike-in concentrations run to very low 
values, depletion cannot be ignored [27] . Of the two physico-chemical models described 
in Appendix A we demonstrate below that this data set is an example of local rather 
than global target depletion. This result is reasonable: The large set of PM and MM 
probes addressing any one target species must extend over distances large compared with 
the nearest neighbour distance on the chip. The remaining question, which we settle 
below in favour of local depletion, is whether diffusion or convection of target molecules 
is slow (local depletion) or fast (global depletion) relative to the rate of hybridisation. 
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2.1. Theory 

For the case of local depletion, the coverage fraction < 9 < 1 of fluorescent dye 
carrying target molecules bound to a given probe feature at the microarray surface is 



shown in Appendix A.l to be 

X N + K s (x s - pOs) (1) 
1 + X N + K s {x s - P 9 s y 1 ' 

where x$ is the spiked-in probe-specific target concentration, p is an effective molar 
concentration of probe molecules immobilised on the microarray surface, X^, called the 
non-specific binding strength, is a dimensionless measure of the degree of non-specfic 
binding and Kg is an effective equilibrium constant for the binding of specific targets 
accounting for several chemical reactions including surface and bulk hybridisation and 
molecular folding. The coverage fraction 8s of specific targets only is given by Eq. (1A.31j) . 



The model of |Appendix A. 1| also allows for the consideration of post-hybridisation 
washing, which is signalled by differing responses of PM and MM features to saturation 
target concentrations [15]. There seems to be little evidence that washing is significant 
for this dataset (see Figure [3]), and for convenience we set the washing survival factors 
to unity in the current analysis. 

The log of the effective equilibrium constant Ks is expected to be approximately 
proportional to probe length. This follows from the definition Eq. flA.27j) and the 
relationship Kp$ oc e AG ^ RT ^ relating the hybridisation constant to free binding 
energy AG, which is well approximated by the SantaLucia nearest neighbour stacking 
model [28] . Ono et al. [27] make use of this result to consider isotherms relating coverage 
fraction to probe length, which we reproduce from the theoretical local depletion model 
in the left panel of Figure [TJ In calculating these curves we use an assumption 
that the ratio i^f M /i^^ M is independent of probe length. This is justified since 
K™/Kf M « gAAG/(i?,T) where AAG = ag pm _ ag mm iS) Qn averag6) independent of 

probe length by virtue of the nearest neighbour stacking model. 

Two saturation behaviours in the limit of large probe length, or high binding 
strength Xs = KsXs, are immediately apparent. From Eqs. flA.241) and flA.311) one 
obtains 

hm 0= lim (0s + N ) = \ \ (2) 

Xs^oo K s ^oo y X/p if X < p. 

In the case x > p, where the concentration of specific target exceeds the effective 
concentration of probes, the probes become saturated (6=1) and any residual unbound 
targets remain in solution. In the case x < p, where the probe concentration exceeds that 
of the targets, the free targets are completely depleted and the maximum fluorescence 
intensity decreases with decreasing target concentration [9 = x/p). Note also that for 
any PM/MM pair, the saturation intensity depends only on specific target concentration 
and not on the presence of mismatches. 

Also shown in the right panel of Figure [1] are the predicted hook curves for varying 
probe length in the case of local depletion. The hook curve method 0,|TT] was originally 



Physico-chemical modelling of target depletion 



5 



Isotherms and Hook Curve at fixed values of concentration 
(Local Depletion) 




Figure 1. Theoretical isotherms and hook curves derived from the local depletion 
model of |Appendix A.l| Each curve represents the response of the coverage fraction 
9 to variations of the specific binding strength for the PM probes, X$ M = Kg M xs 
at fixed specific target concentration xg. Since logKg M oc free binding energy of 
hybridisaton (see text), the horizontal scale in the isotherm plot can be thought of as 
a measure of probe length. Input parameters are: xs/p as indicated by colour in the 
legend; X N = 10 -3 and Kf M = 0.1A' PM . Isotherms for PM probes are plotted as 
solid lines, and for MM probes as dashed liincs. 



developed to analyse data from microarrays whose design includes PM/MM pairs, but 
can be applied to any pair of probe features addressing the same specific target. The 
method processes the PM/MM intensities J PM and 7 MM using the transformation 

A = log 10 J PM - log 10 J MM , S = \ <log 10 J PM + log 10 J MM > , (3) 

where, for Affymetrix GeneChips, the angular brackets denote averaging over probes 
within a probeset. Smoothing the A versus S plot provides a hook curve, whose 
characteristic shape typically assumes the concave downwards curve shown in Figure [TJ 
In previous implementations [HI ELL] the hook curve has been considered as a 
trajectory in the E-A plane as the specific binding strength X$ = K$xs varies due 
to changes in specific target concentration xs while the binding affinity K$ is held fixed. 
For the Suzuki data set we use a different and more appropriate implementation which 
specifically exploits the broad range of binding affinities arising from probe lengths which 
vary from 14 to 25 mer. That is, Figure [1] plots the hook curve as a trajectory traced 
out by varying binding affinity K$ at fixed values of target concentration xs- The left 
hand end of the hook curve (Xs = 0) is determined by non-specific hybridisation and 
will not vary significantly with probe length. The right hand end of the hook curve 
(Xs — > oo) is determined by the saturation intensity, and is expected to shift leftwards 
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6 = 1 , i V P,„ ~n \ ' ( 4 ) 



for subcritical specific target concentrations x < p. 

By contrast, the theoretical isotherms and hook curves for the case of global 
depletion are shown in Figure EJ For a given probe feature P, the theoretical isotherm, 
derived in Appendix A. 2 is now (see Eq. (1A.48|) ) 

X? + K£{xs-p6 s 

where 9 snm = J2p ® P i s the sum total of specific target coverage fractions over all 
probe features addressing the relevant target species, and is determined by an equation 
analogous to Eq. (1A.47I) . For illustrative purposes the curves in Figure [2] are calculated 
for the case of a single PM/MM pair of probe features addressing the target in question. 
Two differences with the local depletion case are immediately apparent. Firstly, since 
the depleted targets are shared among more than one probe feature, the asymptotic 
behaviour of the isotherms at subcritical concentrations differs between different probes 
addressing the the same target species (i.e. limx s ^oo #pm > limx s ->oo #mm) ■ Secondly, 
the shape of the hook curve remains unchanged as the probe length varies by the 
following argument. Since we use an assumption that iff / 'ifcf M is independent of 
probe length, one can think of the hook curve as being parameterised by the variable 
K$ M (xs — pO sum ) where sum has some functional dependence on xs, p, iff M and the 
fixed ratio iff M /if^! IM . Changing the value of xs then simply effects an identical 
reparameterisaton in this variable of both 6 PM and # MM . Individual points will migrate 
along the path of the hook curve, and at subcritical concentrations the curve will be 
truncated at the right hand end at different points, but otherwise the shape of the hook 
curve remains unchanged. 



2.2. Experiment 

The Suzuki spike- in experiment [30] includes spike- in runs of 150 cDNA target sequences 
both with and without a complex background. To keep the analysis simple we analyse 
only the data set without a complex background. The data set with complex background 
provides very similar results with respect to target depletion. 

In Figure [3] are plotted average logged intensities (iav.iog = 10^° 910 ^) over three 
replicates of the 150 sequences for PM and MM probes of varying lengths, the 
mismatches being in the central position of the probe. Thus each plotted data point 
is an average over 450 raw intensities. Some sort of averaging over probe sequences 
to account for the dependence of binding affinity on individual probe sequences was 
necessary in order to separate out the dependence on probe length. This is handled 
in the implementation of the hook curve for Affymetrix chips by correcting intensities 
with position- and nucleotide-dependent sensitivity profiles determined from intensity 
distributions over the whole array [IT] . However, this method cannot be used for the 
Suzuki data set as each array contains a range of probe lengths, making it difficult to 
define meaningful sensitivity profiles. The use of average logged intensities rather than 
averaged intensities was an appropriate and simple solution accounting for the fact that 
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Isotherms and Hook Curve at fixed values of concentration 
(Global Depletion) 




Figure 2. Theoretical isotherms and hook curves derived from the global depletion 
model of |Appcndix A.2| The input parameters and curve conventions are the same as 
for Figure [TJ Also shown in the right panel are the right hand end points of individual 
hook curves indicated by a + sign. As explained in the text, the shape of the hook 
curve remains unchanged as xs varies, except that the curve terminates at different 
right hand points at subcritical concentrations. Hook curves for all values of xs/p 
start at the same left hand point. Note that the critical value of specific concentration, 
below which the isotherm saturates at 9 < 1 now occurs at a value x cr { t /p > 1 as the 
depleted targets are shared among more than one probe feature. 



microarray data is generally observed to have multiplicative errors. Comparison of the 
resulting hook curves in Figure [3] with the theoretical hook curves in Figures CD and [2] 
shows clear evidence for local rather than global depletion. 

Also shown in Figure [3] are fits of a six parameter model to the 168 data points (12 
probe lengths x 7 concentrations x PM and MM). The model, based on the theoretical 
solution Eq. (JTJ for local depletion with = 0, is defined by 

C Aog = A + B9 p , P = PM,MM (5) 

where 8 P is the solution to 

nP KpeW-^jx - p8 p ) 

l + K P e x ^){x- P e p y [) 

i is the probe length and x the spike-in concentration. The parameters A and B 
account for the optical background intensity and saturation intensity respectively and 
the effective equilibrium constant in Eq. ([1]) is modelled by Kg = Kpe x( - £ ~ 20 \ The 
fitted parameter values are listed in Table [U The fitted value of the effective probe 
concentration p = 2.26 pM is consistent with the observations of Ono et al. [27] . 
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Figure 3. The Suzuki et al. data set without complex background. Left panel: 
Fluorescence intensities from PM (x) and MM in the central nucleotide position (+) 
probes obtained by taking average logged intensities of 150 spiked cDNA sequences. 
Fits to the model Eq.[5]arc shown as a solid curve (PM) and dashed curve (MM). Right 
panel: the corresponding hook curves and fits. The black hook curve corresponds to 
the fitted critical concentration x cr it = p = 2.26 pM. 

Table 1. Parameters fitting the local depletion model Eq. ([5]) to the Suzuki 
data set. 



Optical background intensity 


A 


31.7 


Saturation intensity above background 


D 


2.90 x 10 4 


Equilibrium constant of 20 mer PM probe 




0.500 PM- 1 


Equilibrium constant of 20 mer MM probe 




0.022 pm- 1 


Logarithmic length increment of K$ per nucleotide 


A 


1.02 


Bulk equivalent concentration of probes 


V 


2.26 pM 



3. The Affymetrix latin square data set: an example of global depletion 

Affymetrix have produced two well known data sets [1] from experiments in which 
RNA transcripts were spiked in at cyclic permutations of a set of known concentrations 
together with a complex background of cRNA extracted from human pancreas or human 
adenocarcinoma cell line and hybridised onto U95a or U133 GeneChips respectively. In 
a previous analysis [13] the U95a data set was shown to fit very well, and the U133 data 
set moderately well, to a physico-chemical model in which the target concentration was 
assumed not to be significantly depleted from the supernatant solution by hybridisation 



to the microarray surface. This model was the p = limit of the models in Appendix A 
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In this section we reanalyse the U133 data set and demonstrate that the global model 
of target depletion provides a significantly improved fit to this data. 

3.1. Theory 

The global model of target depletion is relevant to U133 Affymetrix GeneChips as the 
elements of a PM/MM pair of features are located in neighbouring locations on the 
microarray surface. Although each targeted gene is represented by 11 such probe pairs, 
we ignore depletion from other features within the same probeset as the design of the 
chip is such that those features are located elsewhere on the chip, and in general will 
target parts of the gene sequence further removed than the typical target fragment size 
of about 200 bases. 

The coverage fraction 6 P , P e {PM, MM}, of fluorescent dye carrying target 
molecules bound to the PM or MM feature at completion of the hybridisation step 
is given by Eq. gj where 6 sum = 0™ + 8 MM is found by solving Eq. fLA47j) . and and 
Kg are the non-specific binding strength and effective equilibrium constant for specific 
binding respectively. The loss of fluorescence intensity due to the post-hybridisation 
washing step cannot be ignored for Affymetrix GeneChips [IS, [21], [29] , and we introduce 
into our model specific and non-specific washing factors w P and respectively, where 

1 > w P > iv^f > 0. The post-washing coverage fraction is then given by Eq. (1A.49|) . 
Finally, the observed fluorescence intensity is 

I P = a + ^after.wash 

l + X p + K P (x s -p9 sum ) ' y J 

where a and b are the physical background and absolute saturation intensities, assumed 
to be constant across the entire microarray. 

3.2. Experiment 

For the purposes of comparing fits of the spike-in data to a null-hypothesis model without 
depletion (p = 0) and the one-sided alternate hypothesis with depletion (p > 0), we 
rewrite the model defined by Eqs. (j7|) and (IA.47P in the form 

where 9 snm (x; K PM , K MM ,p) is the solution in the physically relevant interval < 6> sum < 

2 to 

K p (x-p9 s 



P=PM,MM 1 + ^ ^ PUsnm) 

Here we have suppressed the subscript S on the PM-specific spike-in concentration xs 
and introduced the parameterisation 

A P =a + bw p -^- p , (10) 
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Figure 4. Fits of measured fluorescence intensities in .eel file units against spike- 
in concentrations in pM from a selected probeset of the spiked transcripts in the 
Affymetrix latin square U133 experiment to the 7 parameter model defined by Eq. JSJ. 
Note that a flattening of the PM isotherm and an inflection point in the MM isotherm, 
predicted in Section [3~3l to be a characteristics of target depletion in certain parameter 
regimes, is clearly visible for most of these probes. 



B P =b(w$- w Nj^pj , (11) 
K p 

K p = ^rp. P = PM,MM. (12) 

1 + X N 

Equations (jSJ) and Q define a 7 parameter model (A p , B p , K p , p) to which 
intensity data from a PM/MM pair of features for a range of spike-in concentrations x 
can be fitted. The p = case, corresponding to no significant target depletion, defines 
a 6 parameter model which was previously fitted to the U95a data in ref. [H] and to 
both the U95a and U133 data in ref. [15]. Below we use standard statistical methods 
to distinguish between a null hypothesis, p = 0, and alternate hypothesis p > 0. 

Fluorescence intensities for each of 11 probe pairs from each of 38 spike-in 
transcripts of the U133 latin square experiment were fitted assuming the data to be 
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Figure 5. Histogram of the fitted value of the parameter p (pM) for the all of the 
276 fits with physically meaningful parameter values (hatched bars). Also shown are 
histograms of the subsets corresponding to high and low values of the parameter A' PM . 
The low-A PM probes correspond to data lying to the left of the vertical dotted line in 
Figure [5] and are an approximation to the set of probes for which depletion data can 
also be fitted to a no-depletion model with an effective equilibrium constant given by 
Eq. {H. 



Gamma distributed with mean given by the model of Eq. (jSJ). The assumption of 
Gamma distributed data was used in previous analyses [H] to accommodate a constant 
coefficient of variation as expected for data with multiplicative errors, and is easily 
implemented using the function glm() from the statistical computing environment R [2]. 
Fits of the model to the data of one of the spiked transcripts are plotted in Figure HI and 
analogous plots for all 38 spiked transcripts are available in the supplementary material 
or at the web site of one of the authors [3]. 

Of the 418 probe pairs in the data set, 276 (or 66.0%) were successfully fitted to 
physically relevant values of the effective probe concentration restricted to the range of 
concentratons p > with physical values for the remaining parameters, i.e., Apm, -Bpm, 
Kp M , A MM , B MM and if mm all > 0. This should be compared with fits to the p = 
model without depletion in ref. p3], for which only 37.5% of probes were successfully 
fitted to PM/MM probe pairs. A histogram of the fitted values of the effective probe 
concentration parameter p is shown in Figure 
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Immediately noticeable is that the distribution is bimodal: a number of fits are 
simply the 'no depletion' solution at p — 0, while most of the the remaining cases cluster 
around p = 200 pM. To understand this, note that Eq. (JE]) makes clear that the effect 
of depletion is to reduce the true target concentration x to an effective concentration 

x cff = x - p9 sum , (13) 

where 6 sum is the sum of the PM and MM hybridisation fractions due to specific binding 
only, and is obtained by solving Eq. ([9]). In Figure[6]is plotted the corresponding effective 
binding strength K pu x c s against the true binding strength K PM x. One sees that, below 
a certain binding strength indicated by the horizontal dotted line K PM x e g = 1, the true 
concentration is reduced by a factor which is approximately constant over a range of x. 
In fact, from Eq. ([9]) we have that, for K PM x c s <C 1, i.e. the linear, low-concentration 
part of the isotherm, 8 sum ~ (K PM + K MM )x e s, from which it follows using Eq. ([TBI 
that x e ff ~ + (K PM + K MM )p\. It follows that any probe whose data points lie 
within this range will be fitted equally well by a hyperbolic, no-depletion, isotherm 
I p = A p + B p K p s x/(l + K p s x), with an underestimated equilibrium constant: 

= 1 + {K ™ P +K MMy P = PM > MM - (14) 

In Figure [5] we have partitioned the fitted values of p into those matching with 
high and low fitted values of the equilibrium constant, K PM ^ m 0.0195) pM -1 
respectively. The cutoff is chosen as a simple way to separate out an approximate 
set of probe pairs satisfying the conditions leading to the result of Eq. (114j) : Recall 
that the spike-in concentrations in the U133 experiment are bounded above by 512 pM, 
so for the low-i^ PM probes \og lQ K PM x < 1. That is, the fitted isotherms of these 
probes are determined solely from data lying to the left of the vertical dotted line 
in Figure [6] for which the curves relating logx to logx e fj are approximately straight. 
Returning to Figure El one observes that the high-equilibrium-constant isotherms, 
K PM > 0.195 PM" 1 , fit predominantly to the depletion model withp consistently around 
p = 200 pM, and the low-equilibrium-constant isotherms fit predominantly to the no- 
depletion model with, we infer, the fitted parameter K p underestimated according to 
Eq. (|T4l) . A rough estimate of the lower limit of the underestimation factor, assuming 
k mm < ^pm^ ig + 0.0195)- 1 x 200 w 0.2. 

For the subset of probe pairs which admit physically meaningful fits to both the 
alternate hypothesis model with depletion, and to the null hypothesis model without 
depletion, and for which the fitted value of p is strictly positive, we calculated one 
sided P-values under the null hypothesis assumption using the analysis appropriate to 
generalised models [21] described in detail in ref. [Hj. The histogram of these P-values, 
Figured shows that they are heavily bunched to the left: Depletion is confirmed at the 
5% confidence level for just over 60% of those cases for which the comparison could be 
made. 
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Figure 6. The relationship between the effective binding strength K x e g and true 
binding strength K PM x for a range of effective probe concentrations. The curves are 
calculated with the help of Eq. assuming x PM /K MM = 5, though in practice the 
shape of the curves is not very sensitive to this ratio. The horizontal dotted line is 
the upper limit of binding strengths for which depletion data can also be fitted to 
a no-depletion model with an effective equilibrium constant given by Eq. (|14[) . The 
vertical dotted line is the right hand limit of binding strengths determining the set of 
\ow-K PM probes in Figure [5] 



3. 3. Shape of the isotherms 

It is interesting to examine the shape of the isotherm fits in the global PM/MM depletion 
model to see how they they differ from the well known hyperbolic Langmuir form of 
the model without depletion, and from the isotherms of the local depletion model. It is 
convenient to define dimensionless quantities 

P _ 1 [X) A A 

On physical grounds we expect s > 1, which is observed in general in fits of spike-in 
data to models with and without depletion. Eqs. ([8]) and (Q become 

qPM _ K PM (x — pOsum) qMM _ K PM (x — pOsum) ,-.„■, 

- 1 + K PM {x _ p6snm) > ~ s + K PM {x _ p9snm) > I ) 

with # sum the solution to 

if PM (a:-^ sum ) K PM (x-p6 snm ) 
sum l + i^ PM (x-^ sum ) + s + K^(x-p6 sum y [ } 

Plots of P as a function of the dimensionless K PM x for the realistic value s — 10 
and a range of values of the dimensionless depletion parameter K FM p are shown in 
the right panel of Fig. [HI Also shown for comparison (left panel) are the equivalent 
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Figure 7. Histogram of the fitted P-values under the null hypothesis of no target 
depletion (p = 0) tested against the alternate hypothesis of global target depiction 
(p > 0) for each of those probe pairs which admit physically meaningful fits to both 
models. Just over 60% of cases fall within the 5% confidence interval (P-valucs < 0.05), 
favouring the alternative hypothesis. 



isotherms from the local depletion model, for which 6* sum in Eq. lTl6l) is replaced by PM 
or G MM respectively. The effect of depletion is to depress the response function at small 
specific target concentrations, as the available effective specific target concentration is 
effectively decreased. For the case of the PM/MM global depletion model, we show in 
Appendix B that for K PM p > (s — l)" 1 , and provided s > 1, the MM response curve 
acquires an inflection point, while the PM curve flattens without forming an inflection 
point. Physically, the effect of depletion on the MM response is more pronounced as 
the PM probes more strongly deplete the available target in solution. This behaviour 
is clearly evident in fits to the U133 spike-in data (see Figure H] and the supplementary 
material). A straightforward calculation shows that isotherms from the local depletion 
model, on the other hand, do not have an inflection point for either PM or MM probes 
for any parameter values. 

3.4- Shape of the hook curve 



Theoretical hook curves assuming either a local or global depletion model and parameter 
values typical of fits to the U133 Latin Square data set and a range of the probe density 
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Figure 8. Theoretical isotherms for PM and MM probes derived from local (left) and 
global (right) depletion models. 

The isotherms are scaled to dimensionless units K PM x, Q p = (I p — A p )/B p for 

s = 10 and various values of the dimensionless depletion factor K PM p. As explained in 

the text, in the global PM/MM model, the MM isotherms have an inflection point for 

K PM p > (s — 1) _1 , whereas the PM isotherms do not have an inflection point for any 

value of this parameter. Isotherms from the local model have no inflection point for 

any parameter values. Note that these isotherms are plotted at fixed values of binding 

constant K PM , whereas the isotherms in Figures [1] and [2] are plotted at fixed values of 

specific target concentration x, and consequently have different asymptotic properties 

as K PM x -> oo. 

parameter p are shown in Fig. [9j For these curves the trajectory is that of the pair (E, A) 
defined by Eq. ([3]) traced out as the specific binding strength X PM = K PM xs varies over 
a range of specific spike-in concentrations xs at fixed values of all other parameters in 
the model. For the case of global depletion the hook coordinates are calculated from the 
post-washing coverage fractions Eq. (1A.49|) with 9 sura given by Eq. (1A.47I) . For the case 
of local depletion 9 sum is replaced by 9 PM or 6 , | IM respectively defined by Eq. ( 1A.31I) . 

One sees that the effect of local depletion is to flatten the peak and introduce an 
asymmetry in the hook curve. The flattening is caused by a decrease in the difference 
between the PM and MM responses as more specific target is extracted from solution in 
the vicinity of the PM probe feature. Global depletion, on the other hand, has no effect 
on the shape or end points of the hook curve as it effects an identical reparameterisation 
Xs — > Xs — p9 sum in the formulae for both 9 PU and 9 MM . However, as p is increased, 
internal points corresponding to a given probe-pair value of the binding strength migrate 
progressively to the left along the curve, reflecting a decrease in both the PM and MM 
fluorescence intensities. 
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Figure 9. Theoretical hook curves determined from isotherms of a PM/MM pair 
assuming local depletion (left) and global depletion (right) for a range of the effective 
probe concentration parameter p (pM). The following parameter values, typical of 
fits to the U133 Latin Square spike-in data set, were used: -X"™ = X^f M — 10 -3 , 
5 x 10 -3 pM -1 , K¥ M = 5 x 10~ 4 pM -1 , and washing survival fractions 



k pm 



w ™ = to™ = 0.1, w^ M = 0.5, wf M = 0.2. Note that that for global depletion the 
shape of the hook curve is independent of p. 



< 




Figure 10. Experimental hook curve of one array of the U133 Latin Square data set 
and a fit using assuming the hyperbolic Langmuir response function without depletion. 
Note the symmetric shape of the experimental hook curve, compatible with global 
depletion. The deviation between the experimental and theoretic curve at small E is 
caused by non-specific hybridisation not discussed here (see [51 fTTjL 
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A typical hook curve from one of the arrays of the U133 Latin Square data set 
using the algorithm at ref. [I] is shown in Fig. [101 This algorithm includes a moving 
average over ~ 100 probesets and correction of raw intensities for probe binding affinities 
using position- and nucleotide-dependent sensitivity profiles [11]. Hook curves have 
been similarly evaluated for a number of experimental datasets relating to Affymetrix 
GeneChips in ref. [8], including the Latin Square spike-in experiments, with the result 
that no evidence for an asymmetric hook curve has yet been observed. We conclude, 
within this particular set of data sets corresponding to chip designs with neighbouring 
PM/MM pairs, that target depletion, if significant, fits the global model rather than 
the local model. 



3. 5. Correction of expression estimates 

Estimates of expression levels using algorithms such as the hook method [8] and the 
inverse Langmuir method [25] have to date ignored target depletion and therefore been 
based on the assumption of a hyperbolic Langmuir isotherm. In the previous section we 
have seen that this is equivalent to underestimating the true specific target concentration 
%s by a shift x$ — > x c q = x$ — pO S um, where 9 snm is the sum of the PM and MM 
hybridisation fractions due to specific binding only. 9 smn can be calculated from the 
observed total coverage fractions 8 PM and # MM ; which include both specific and non- 
specific binding, as follows: 
From Eq. ([QTjl . 



i/PM„ 7>-MM™ 
f\g ^cff A g X eff 

'PM^ 1 i V i I^MM, 



l + X N + iq M x cS l + X N + K^ M x c s x ' 

where the nonspecific strength = X™ m XfJ- M is assumed to be common for all 
probes on the microarray after correction for binding affinities via sensitivity profiles. 
Atv can be measured from the width of the hook curve and is typically of order 10~ 3 . 
From Eq. (IXig]) . 



X N + Kg x eS 

1 + X N + K§X eS 

which rearranges to give KgX e s = 9 P /(1 — 9 P ) — Xm- Substituting back into Eq. (CO 
then gives # sum = (1 + A^)(^ PM + 6 MM ) - 2X N . 

Thus, the true specific target concentration is given in terms of the effective, 
depleted target concentration by 

x s = x cS +p [(1 + A^)(^ PM + 9 UM ) - 2X N ] . (20) 

In principle, this formula gives the correction for target depletion over the entire range of 
target concentrations, including an interpolation between the two regimes illustrated in 
Figure [6j Note that x c fj, X N and the coverages 6p can be estimated by established 
methods such as hook curve or inverse Langmuir method. Eq. fTSOj) then requires 
knowledge of the probe concentration p, which, for example, is expected to depend 
on the chip type. Its estimation requires further efforts which will be the subject of 
future investigations 
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4. Critical evaluation of an alternate hybridisation model 

Recently an alternate competitive hybridisation model incorporating target depletion 
by Li et al. (23] has appeared in the literature. This model is applied to the Affymetrix 
U133 data set and is purported to be capable of predicting signal intensities of individual 
probes and of achieving quantification of absolute target concentrations from microarray 
fluorescence intensity data. Here we point out a number of errors in the basic 
assumptions of Li et al.'s model and argue that it does not represent any advance 
over previously existing hybridisation models. 

Of particular interest to Li et al. is the asymptotic behaviour of fluorescence 
intensities for individual probes in the limit of saturation concentrations of specific 
target. Standard reaction kinetic models applied to the hybridisation step of the 
Affymetrix protocol implies that in the high specific target concentration limit, all 
probes should saturate at the same observed fluorescence intensity, regardless of the 
nucleotide probe sequence or resulting probe-target binding free energy. For either of 



the models in Appendix A for instance, we have lim^g^oo 9 = 1, where the limit is 
taken with other variables being held constant. This is at variance with observations 
from spike- in experiments, for which the PM element of a PM/MM almost invariably 
saturates at a higher intensity than its MM partner. 

An acceptable explanation, which has been demonstrated to fit well the saturation 
behaviour to both the U95a and U133 Affymetrix spike-in experiments [T5| [2Tj . is to 
explain the differing asymptotes via the post-hybridisation washing step, which not 
only removes unbound targets, but also dissociates both specific and non-specific bound 
targets (see Eq. flA.491) ). For reasons which are not clear, but which appear to be based 
on a misinterpretation of Skvortsov et al.'s experimental results [29], Li et al. reject 
the washing hypothesis. Instead, they proceed to develop their own thermodynamic 
model, which is not consistent with accepted principles of physical chemistry, but which 
nevertheless predicts response functions with binding free energy dependent asymptotes 
resulting from the hybridisation step alone. In their model, the washing step is assumed 
to have little effect on specific targets bound to probes. 



In Appendix C we explain in detail a fundamental error in their application of the 
law of mass action to hybridisation at the microarray surface, and show that when the 
error is corrected, their model essentially agrees with existing treatments inspired by 
Langmuir adsorption theory, together with the depletion extension of Ono et al. [27J. 
We also note that their derived formula for the coverage fraction of specific targets is 
demonstrably wrong in that it disagrees with the results of the Affymetrix latin square 
spike-in experiments without complex background. Lastly, the algorithm proposed 
by Li et al. for inferring absolute specific target concentrations requires subtraction 
of the intensity at zero spike-in concentration as a way of dealing with non-specific 
hybridisation (see Eqs. (22) and (27) of ref [23]). This value is of course unknown 
in any biomedical application of microarrays, and it is the problem of calibrating a 
correction for non-specific hybridisation which is the subject of much current activity 
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in physico-chemical modelling of microarrays (see refs. [221 [B] for instance). That Li 
et al. are able to produce estimates of spike-in concentrations at the higher end of 
the scale (> 1 pM) by cross validation from a crude 4 parameter formula based on 
incorrect physical assumptions is not surprising and is not an improvement on any 
existing expression measure. 

5. Conclusions and outlook 

The physico-chemical models of equilibrium microarray hybridisation described here 
involve microarray probes, specific and non-specific targets and their interactions 
on the chip surface. As well as probe-target hybridisation, bulk hybridisation and 
probe and target folding, the important innovation is a careful consideration of 
depletion of target molecules from the supernatant solution by hybridisation of specific 
targets. Consideration of target depletion is important when the target concentration 
is comparable with or less than the effective probe molecule concentration, which we 
determine to be of the order of 200 pM for HG133 generation Affymetrix GeneChips. If 
the sensitivity of microarrays is to be pushed to lower specific target concentrations, a 
proper understanding of and appropriate correction for this phenomenon is important. 

Two possible scenarios are considered, local and global depletion. In the first 
scenario, studied by Ono et al. [27], depletion by hybridisation to a given probe feature 
only affects that particular feature. This scenario is relevant when probe features 
addressing the same target species are physically separated on the microarray, and 
the rate of diffusion or convection over the distance between features is small compared 
with the rate of hybridisation. The second scenario, global depletion, has not been 
considered previously. In this scenario some or all of the features addressing a given 
target species are effected. This is relevant, for instance, for chip designs which include 
mismatch features located in close proximity on the microarray surface to their perfect 
match partners. 

We analysed data obtained in two experimental situations: firstly, the intensity 
response of PM and MM probes of varying probe length at fixed target concentration 
(the Suzuki et al. data set), and secondly, the intensity response of PM and MM probes 
of fixed length at varying target concentration (the Affymetrix latin square data set). 
The PM/MM design of the chips allows for a combined analysis of both probe types 
via the "hook plot" , the shape of which gives a clear discrimination between local and 
global depletion. 

We have confirmed conclusively using the hook curve analysis that the spike-in data 
set of Suzuki et al. [30J is an example of local and not global depletion. A six parameter 
fit of the local depletion model verifies the earlier analysis of Ono et al. [27] . The hook 
curve analysis has proved particularly useful for this type of analysis because of the 
marked qualitative difference in the behaviour of these plots between the two possible 
scenarios. 

Previous attempts to fit a hyperbolic Langmuir isotherm model to the second data 
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set, the Affymetrix U133 latin square spike-in, had only met with partial success [13J. 
In our current reanalysis of this data set we have had markedly improved success using 
the global model of target depletion, which we believe is relevant because of close 
proximity of partner PM and MM probe features. The global depletion model provides 
a significantly improved fit to a large portion of these data, namely that portion for 
which the effective equilibrium constant of the hybridisation reaction is above a certain 
threshold value. Importantly, we have demonstrated that if the effective equilibrium 
constant K is below the inverse of the range of concentrations of a spike-in experiment, 
the ability to detect target depletion through response curve fits is masked and the 
data may mistakenly be fitted to the linear part of a non-depleted hyperbolic Langmuir 
isotherm with an underestimated equilibrium constant given by Eq. (I14p . 

For the Affymetrix spike-in data our depletion model is also able to explain 
certain qualitatively observed phenomena related to the shape of the isotherms. The 
MM response function typically has an inflection point at low concentrations which 
may serve as a signal for global depletion in spike-in experiments, whereas the PM 
response function is typically flattened but does not form any such inflection point. 
Another characteristic of global depletion is the shape of the hook curve which 
continues to be symmetric as the effective concentration of free specific targets is 
reduced by hybridisation. Local depletion, on the other hand, is predicted to entail 
an antisymmetric hook curve. 

In the final section we have given a critique pointing out a number of serious errors 
in a competing physico-chemical model dealing with target depletion in microarray 
hybridisation experiments by Li et al. After correction of these errors one gets a solution 
which, in the limit of no depletion, is the well established and accepted Langmuir model. 
With depletion included it is a simplified version of our local depletion model or the 
model of Ono et al. [27]. 

The observations made herein, particularly those for the Affymetrix U133 data 
set, have consequences for existing physico-chemistry-based algorithms and methods 
for microarray calibration. By calibration we mean obtaining estimates of transcript 
abundance, ideally as an absolute concentration or, at the very least, relative measures 
which are related linearly to transcript abundance. It must include not only systematic 
correction for the effects such as non-specific background, saturation and sequence- 
specific binding affinities of probes [12], but also, as we have shown, depletion of targets 
from the supernatant solution. 

Physico-chemical calibration algorithms rely directly or indirectly on obtaining 
estimates of the effective equilibrium constant K from probe sequences via position 
dependent affinities 0, [10] or via free binding energies AG calculated from nearest 
neighbour stacking models [T71 [T8] . To date they have assumed a hyperbolic Langmuir 
isotherm and involve fits to spike-in data sets including the Affymetrix HGU133 data 
set. We have shown here that estimates of K from this data set are compromised 
in a predictable way by target depletion if a hyperbolic isotherm is assumed. It is 
consequently not surprising that attempts to find a clear and unambiguous relationship 
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between K obtained in this way and AG have met with limited success (see Section 
5.2 and 5.3 of [13] )• Clearly more work has to be done in correcting this aspect of 
calibration algorithms to take into account target depletion. Finally, irrespective of 
whether calibration algorithms rely on inverting a theoretical isotherm [25] or first 
extracting an effective binding strength X e s = Kx e s from, say, the hook curve |12j . 
a solution must be found to the problem of extracting the true target concentration 
x from the microarray-depleted concentration x e Q. In Section 13.51 we show that the 
information required to do this is, in principle, inherent in the measured fluorescence 
intensities via Eq. (|20|) . A practical implementation of this will be the subject of future 
work. 
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Appendix A. Physico-chemical model 

In the physico-chemical model presented below the equilibrium coverage fraction 9 
(0 < 9 < 1) of fluorescent dye carrying target molecules bound to a given probe 
feature at the microarray surface at the end of the hybridisation step is calculated 
assuming standard equilibrium physical chemistry. The model differs from previous 
models considered by the current authors [SI EE] in that the Ono model [27] of target 
depletion from the supernatant solution by hybridisation to the array is included. Two 
regimes are considered: 

The first of these, local depletion, in which depletion by a given probe feature 
only affects that particular probe, is a slight variant of the "finite hybridisation 
model" including competitive specific and non-specific hybridisation presented by Ono 
et al. [27J. It differs from the Ono model in that all chemical reactions, viz. folding, 
bulk hybridisation and surface hybridisation, are integrated ab initio, leading to slightly 
different formulae for the final coverage fraction. A detailed derivation of local depletion 
is included here for completeness and to establish the notation and a framework for the 
second regime, global depletion, in which all features responding to a given target species 
are affected. 

Appendix A. 1. Local Depletion 

In this case there is assumed to be no interaction between different probe features. The 
set of chemical species considered is set out in Table IAT1 For a given probe feature, the 
input parameters to the model are (1) the total specific target concentration, 

Xs = [S] + [S'} + [P.S] + [S.N] + [S.S], (A.l) 
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(2) an effective total non-specific target concentration, assumed to be common to all 
probe features: 

x 7v = [N] + [N'l + [P.N] + [S.N] + [N.N] , (A.2) 

(3) an effective probe concentration for the feature: 

p=[P] + [P'] + [P.S] + [P.N], (A.3) 

and (4) a set of equilibrium constants K r , where r £ {Sfold, Nfold, . . .}, for the reactions 
(1A.5|) to (1A.12I) set out below. Following the usual convention square brackets indicate 
the molar concentration of a chemical species. 



Table Al. Chemical species present in the model. 





unfolded folded 


specific target in solution 


S S' 


non-spec, effective target in solution 


N N' 


probe at surface (not bound to target) 


P P' 


duplexes in solution 


S.S, S.N, N.N 


duplexes at microarray surface 


PS, P.N 



Our aim is to determine the total coverage fraction 

p p 

of both specific and non-specfic duplexes resulting from the following chemical reactions: 
Folding 

S^S': [S ; ] = K SMd [S]. (A.5) 

N^N': [N'] = K NMd [N]. (A.6) 

P-P': [P'] = K Piold [P]. (A.7) 
Bulk hybridisation 

S + N^S.N : [S.N] = K S x[S][N]. (A.8) 

S + S^S.S: [S.S] = K SS [S] 2 . (A.9) 

N + N ^ N.N : [N.N] = K NN [N] 2 . (A.10) 

Surface hybridisation 

P + S^±P.S: [PS] = K PS [P][S]. (A.ll) 

P + N ^ P.N : [P.N] = K PN [P][N]. (A. 12) 

We begin by using Eqs. (I A. 51) to flA.121) to eliminate concentrations of folded species 

and and of most duplex species from Eqs. flA.ll) to (]A.4|) . From Eqs. flA.31) and (IA.4I) 
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we obtain 

n = K PS [S] 

S (1 + K PioM ) + K PS [S] + K PN [NY { ' ' 

Q = ^pn[^] (AU) 

N (l + K P{old ) + K PS [S}+K PN [NY y ' J 

and from Eqs. flA.ll) and (IA.2I) . 

x s = (l + infold + Ksv[N}) [S] + A^S] 2 + [P.S], (A.15) 

x N = (1 + K Nfold + K SN [S]) [N] + A- NN [iV] 2 + [P.N]. (A.16) 
Following ref. [6] we make the reasonable assumptions 

(i) A"ss[5'] << 1: Specific targets will not easily encounter each other in bulk solution; 

(ii) A'sn^] << f: Very little of the depletion of nonspecific targets by bulk 
hybridisation is due to encounters with the specific targets in question; 

(iii) [P.N] << x^'. The proportion of nonspecific background depleted by hybridisation 
to the microarray is negligible. 

With these assumptions, the above equations reduce to 

x s = (l + K SMd + Ksn[N]) [S] + [P.S], (A.17) 

x N = (1 + K mold ) [N] + A' NN [Af • (A.18) 
Eq. (1A.18I) is a quadratic in [N] whose solution we will write as 

[N] = f N (x N , Ann, # Nfold ). (A.19) 
Previously (Eq. (2.8) of [5] and Eq. (1) of [6]) the following approximation 

[N] « — (A.20) 

1 + ANfold + Ann^TV 

has been used, though this approximation is not necessary in the current context and 
is only included for comparison with previous work. We also have 

xs-[PS] 



-f^Sfoid + K SN f N (xN, Ann, -^Nfoid) 
x s - [PS] 



1 + A'sfold + K s ^x N 
once again employing the same approximation. 

Substituting back into Eqs. (1A.13|) and (1A.14|) gives 



(A.21) 
(A.22) 



where 



K s^s-[P-S]) 

Bs ~ i + x N + K s (xs-[p.s]y (A - 23) 

9n = i + x N + K s N (x s -[p.s]y (A - 24) 

Xn = : ; ™ — In(xn, A"nn, A" N foid) (A. 25) 
A + Apf i d 



K PN x N 



(1 + A"p fo i d ) (1 + ANfoid + K NN x N ) 



(A.26) 
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and 



(1 + A Pfold )(l + A Sfold + K SN f N (x Nl A NN , A Nfold )) 



K 



PS 



(A.28) 



(1 + A" Pfold )(l + ATsfoid + K SN x N ) ' 
Finally, using Eq. (|A.4j) . gives 

9s = KS {XS ~ P6S) r, (A.29) 

l + X N + K 8 (xs-p6 8 y 1 ; 

6n = i + x N + K N s ( Xs - P 9 s y (A - 30) 

The quantity Xn is known as the non-specific binding strength, and in the 
approximation of Eq. (IA.26I) is often written in the form Xm = Kj^xn where Kn is 
an effective equilibrium constant for non-specific binding. It is also common to define 
a specific binding strength X$ = K$xs in terms of the effective specific equilibrium 
constant Kg and specific target concentration. 

For given x$, %n, P and equilibrium constants K r , Eq. flA.29j) is a quadratic in 9$ 
with a unique solution in [0, 1], namely 



fc = i i±^ + i + ^-./fi±^ + i + ££) 2 -4^ 

2 K s p p V V K sP P J P 

The required result is then 



(A.31) 



9 = 9 S + 9 N = / N v +Ks J X r P9 iv (A.32) 
l + X N + K s (x s -p9 s ) 

If post-hybridisation washing is significant, it is introduced into the model via specific 

and non-specific survival factors ws and w^, where 1 > ws > > 0, giving 

a a , a w sX N + w N K s (x s - p9 s ) , . QQA 

t7aftcr.wash — WS^S + ^N^N — ~ ~ , ~ , TT^T ■ [A.66) 

1 + X N + K s (x s - p9 s ) 

Appendix A. 2. Global depletion 

In the case of global depletion the target concentration specific to a given feature is 
assumed to be depleted by the hybridisation to all features which target the same 
chemical species. Below we consider the case of a PM/MM pair of probe features, though 
the analysis readily generalises to any number of features addressing the same specific 
species. We use superscripts PM and MM to indicate probe molecules on respective 
elements of a PM/MM pair, and denote by S the target species complementary to the 
PM probe. With these changes the set of input parameters become (1) the total specific 
target concentration 

Xs = [S] + [S'} + [P PM .S] + [P MM .S] + [S.N] + [S.S], (A.34) 

(2) an effective total non-specific target concentration 

XN = [N] + [N'\ + [P PM .N] + [P MM .N] + [S.N] + [N.N], (A.35) 
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(3) an effective probe concentration, assumed to be the same for PM and MM, 

p = [P PM ] + [P PM '} + [P PM .S] + [P PM .N] 

= [P MM ] + [P MM '] + [P MM .S] + [P MM .N], (A.36) 

and a set of equilibrium constants K^, which may or may not depend on P = PM, MM, 
depending on the reaction r. Our aim is now to determine a coverage fraction 

e p = e P + e^ = ^-^+ [ ^-^, p = pm,mm (A.37) 
p p 

for both elements of a probe pair. 

Analogous to Eqs. (1A. 13|) and (1A.14j) we have 

^ ( l + ^)SuM ' P — PM, MM, (A.38) 

^ = (i + *^)?S] + ™ mm ' < A39 > 

After making the 'reasonable assumptions' of the previous section, Eq. (1A. 1 7[) becomes 
x s = (l + tfsfoid + Kss[N]) [S] + [P PM .S] + [P MM .S], (A.40) 

and Eqs. flAlSl) and ([091) remain unchanged. Then Eqs. ( IA.23I) and ( 1A.24I) become 
op _ KP s (fs ~ [P PM S] - [P MM S]) 

3 1 + X p + K p (x s -[P PM .S]-[P MM .S}Y 1 ' } 

N l + XP + KP(xs-[P PM .S}-[P MM .S}) , 1 ' } 

where 

p K PN 

^ N = 1 Z KP f N ( XN ' -^NNj -^Nfold); (A. 43) 

and 

K p 

j^P _ ps ^\ 

(1 + -Kp fold )(l + A' Sfold + K SN f N (x N , if NN , A' Nfold )) 

Using Eq. ()A.37j) then gives 

[x s - p{e^ M + MM ^ 
i + x p N + At [x s - p(e™ + ef M )] 



nF _ "a J-v-a 1 "_S )\ /a .r\ 

i , , tsP r„ „//iPM , /iMMM ' (rv.'iuj 



^ = [s s -p(0™+or (A ' 46) 

Summing Eq. ( 1A.45I) over P and defining # sum = 8 PM + 6 I ^ [M , gives 
a ST K p (x s -p6 sum ) 

P=PM,MM + N+ S P Vs *™) 

This equation is cubic in 6 sum , and can easily be solved numerically as a function of xs, 
K P , X^j and p using a Newton- Raphson algorithm. The required coverage function is 
then 

e p = x N + ^(xs- P 9 sum ) = 

l + X£ + A^(x s -p0 suin )' 1 ; 
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Again post-hybridisation can be introduced into the model via specific and non-specific 
survival factors, giving 

Wg X£ + w^Kg (x s - pOsum) 



e 



after. wash 



l + X? + K§(x s - P 8 s 



P = PM, MM. 



(A.49) 



Appendix B. Analysis of the shape of the isotherms 



We demonstrate that in the global PM/MM depletion model with s = K PM /K MM > 1 
considered in Section 13.31 the MM response curve acquires an inflection point for 
sufficiently high values of K PM , while the PM response curve flattens without forming 
an inflection point as K PM increases. 

Defining 0(x) = K PM (x - p9 sum ), Eq. (JED gives 9 PM = 0/(1 + 0) and 9 MM = 
0/(s + 0), and thus 

d 2 Q PM 0" (<J)') 2 d 2 Q MU set)" id)') 2 . , 



dx 2 (1 + 0) 2 (1 + 0) 3 ' 

while differentiating Eq. (fTTj) twice gives, 



dx 2 



+ 



s + 



2s- 



-2s- 



f\2 



P K PM (1 + 0) 2 (1+0) 3 ' (S + 0) 2 

One easily checks that 0(0) = 0, and thus Eq. ( 1B.2I) implies 



0"(O) 



2(1 



-0'(O) 2 . 



s(l + s + s/(pK PM )) 
Substituting back into Eq.l lB.ll) at x = gives 

d 2 e™ 

x=0 

= 2 



(B.2) 



(B.3) 



dx 2 



0"(O) - 20'(O) 5 



s(l + s + s/(pK PM )) 

( - ^'(o) 2 



1 0'(Of 



8 (1 



1 



-0'(O) 2 < 0, 



--(1+.)-' ' (R4) 

for s > 1. That is, the PM response curve is concave downwards at the origin for all 
physically relevant values of s. In fact d 2 Q PM /dx 2 \ x= o increases from — 2(i^ PM ) 2 at p = 
to as p — ► oo and hence the response curve flattens to an almost straight line. 
Similarly we have 



2aMM 



d 2 e 



dx 2 



-0"(O) - -0'(O) 2 



x=0 



s 
2 



1 + s 2 



s + s/(pK PM ) 



1 0'(O) 2 , 



(B.5) 
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from which it follows that 

d 2 e MM 



dx 2 



according as pK 



PM <- 



(B.6) 



x=0 



Thus, the MM response curve has an inflection point for pK > l/(s — 1). 



Appendix C. Critique of Li et al. 

We point out errors in the thermodynamic model proposed in a recent paper by Li et 
al. [23]. The primary source of error in this paper is an incorrect use of the law of 
mass action in Eq. (3) of their paper describing the rate h- m of binding of specific and 
non-specific targets to probes. In the notation of Li et al., the corrected form of the 
equation is 



(l-a-P)pk b ([T] + [N}), (C.l) 

where a and (3 are specific and non-specfic coverage fractions (equivalent to our 9s and 
9n), p is the effective probe concentration, [T] and [N] free specific and non-specific 
target concentrations, k b the reaction rate for binding (assumed to be determined by 
a rate-determining initiation step and therefore the same for specific and non-specific 
targets), Na is avogadro's number and V volume of the hybridisation solution. The 
factor ([T] + [N]) is missing from Li et al.'s paper, either intentionally or through an 
oversight, but must be present if the reaction proceeds at a rate proportional to the 
product of the concentrations of each of the reactants. 

With this correction, Eq. (5) of ref. [23] balancing the forward and backward 
reaction rates becomes 

(1 - a - f3)pk b ([T] + [N]) = apk d + (3pk n , (C.2) 

where kd and k n are dissociation rate constants for specific and non-specific duplexes 
respectively. Eq. (8) of ref. [23] is best derived by balancing the forward and backward 
rates for specific and nonspecific targets separately: 

• (T) . (T) 

77 77 

(1 - a - B)ph\T] = j^-y = = apk d , 

AN) -(AT) 



77 77 

(1 _ a _ /3)j)MJV] = ^_ = _^_ = aA , 

f) = m a ' (a3) 

in agreement with Eq. (8) of ref. [23]. In fact this equation cannot be derived without 
the assumption that the forward reactions are driven at rates proportional to the target 
concentrations, as used in Eq. fIC.ll) above, but not in Eq. (3) of ref. [23]. Substituting 
this back into Eq. ( 1C.ll) gives the corrected form of Eq. (9) of ref. [23], 



giving 



„ = ! = 'SiB ( C4) 

1 + {ki/K)(\N\/[T}) + (k d /h)(l/[T]) 1 + K T [T] + K N [N] ' 1 ' 
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where we define specific and non-specific equilibrium constants = k^/kd and 
Kn — kb/k n respectively. A similar calculation gives the non-specfic coverage fraction 
as 

3 - K »W (C5) 

P 1 + K T [T] + K N [NY { } 

Li et al. incorporate target depletion by hybridisation from the supernatant solution 
by making the substitution [T] = T — ap, where T is the nominal spike-in concentration. 
With appropriate changes of notation, the corrected equations (IC.4I) and flC.5j) with this 
substitution are essentially nothing more than simplified versions of the Ono model [27J, 
or of our local depletion model Eqs. (1A.29I) and (1A.30I) . without inclusion of probe or 
target folding or bulk hybridisation in the supernatant solution. Li et al. then proceed 
to fit their model to the U133 Affymetrix latin square data set. However, the above 
substitution corresponds to local, not global, depletion, which we have demonstrated in 
Section [3] is not appropriate for this data set. 

Finally we note that Li et al.'s Eq. (12) for the specific coverage fraction (the 
corrected form of which is Eq. ( 1C.4I) ) , namely 

a = s [sic], (C.6) 

l + k d [l/k b + 1 /{T-ap)} 

where 7 = (l/k n + l/kb)[N], cannot be correct by the following reasoning. In the 

absence of a non-specific complex background ([N] — > 0, and thus 7 — * 0), this equation 

predicts that the coverage fraction should be independent of spike-in concentration 

(a — > 1/(1 + kd/kb)), and indeed equal to their predicted binding affinity dependent 

saturation coverage over the whole range of spike-in concentrations T. This is obviously 

wrong, as evidenced by a version of Affymetrix's U95a latin square spike-in experiment 

without complex background [13] in which the experimentally obtained coverage fraction 

clearly responds to target concentration. 



Glossary 

Hybridisation. The reversible chemical reaction by which target molecules in solution 
bind to probes attached to the microarray surface to form duplexes. 

Microarray. A high-throughput device for detecting the presence of large biological 
molecules (DNA, RNA or proteins) of specific known letter sequences via their 
binding to molecules of complementary sequences attached to a solid surface. They 
are high-throughput in the sense that large numbers of sequences are tested for in 
a single device. The microarrays discussed here are oligonucleotide gene expression 
microarrays, that is, they have short DNA probes and are intended for the detection 
of expressed genes through their messenger RNA. 

Non-specific hybridisation. The hybridisation of target molecules with sequences other 
than those of the intended sequence. When dealing with microarrays with a 
PM/MM (perfect match/mismatch) design, 'non-specific' is used to mean 'non- 
PM-specific', that is, hybridisation of target molecules which are not complementary 
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to the PM sequence, irrespective of whether they are binding to the PM or MM 
member of a probe pair. 

Perfect match/Mismatch probes. (Conventionally abbreviated as PM and MM.) A 
common design in Affymetrix GeneChip microarrays is to represent each targeted 
nucleotide sequence by two neighbouring probe features: the PM, whose DNA 
probe sequence is exactly complementary to the target sequence, and the MM, 
whose DNA sequence is identical to the PM sequence except that the base in the 
central position of the probe sequence is replaced by a base complementary to that 
in the PM sequence. The idea behind the MMs is that they should respond to 
non-specific targets in a way similar to their PM partner, and can be used as a way 
of controlling biases due to non-specific hybridisation. 

Probe. A biological molecule attached to the microarray surface during fabrication. 

Spike-in experiment. An experiment in which known concentrations of a specific set of 
target molecules are artificially added to a solution not otherwise containing those 
specific targets, and the solution hybridised onto microarrays. 

Target. A biological molecule in the solution hybridised onto the microarray during a 
laboratory experiment. 
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http: / / www.affymetrix.com/ support / technical/ sample_data/ datasets. affix 



http://cran.r-project.org/[~ 



http: / / wwwmaths.anu.edu.au/ cbis/~conrad/ Spike in Jsotherms / 

http://www.izbi.uni-leipzig.de/engli sch/downloads Jinks/programs/hook.php?gro) qp=links. 
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